Khobar
Wasm: A Pipeline for Constructing Structured Arabic Interleaved Multimodal Corpora
Hennara, Khalil, Bastati, Ahmad, Hreden, Muhammad, Hamed, Mohamed Motasim, Aldallal, Zeina, Chrouf, Sara, AlModhayan, Safwan
The performance of large language models (LLMs) and large multimodal models (LMMs) depends heavily on the quality and scale of their pre-training datasets. Recent research shows that large multimodal models trained on natural documents where images and text are interleaved outperform those trained only on image-text pairs across a wide range of benchmarks, leveraging advanced pre-trained models to enforce semantic alignment, image-sequence consistency, and textual coherence. For Arabic, however, the lack of high-quality multimodal datasets that preserve document structure has limited progress. In this paper, we present our pipeline Wasm for processing the Common Crawl dataset to create a new Arabic multimodal dataset that uniquely provides markdown output. Unlike existing Arabic corpora that focus solely on text extraction, our approach preserves the structural integrity of web content while maintaining flexibility for both text-only and multimodal pre-training scenarios. We provide a comprehensive comparative analysis of our data processing pipeline against those used for major existing datasets, highlighting the convergences in filtering strategies and justifying our specific design choices. To support future research, we publicly release a representative dataset dump along with the multimodal processing pipeline for Arabic.
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > Middle East > Saudi Arabia > Eastern Province > Khobar (0.04)
- Asia > Middle East > Jordan (0.04)
Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
Hennara, Khalil, Hreden, Muhammad, Hamed, Mohamed Motasim, Bastati, Ahmad, Aldallal, Zeina, Chrouf, Sara, AlModhayan, Safwan
Arabic document OCR remains a challenging task due to the language's cursive script, diverse fonts, diacritics, and right-to-left orientation. While modern Multimodal Large Language Models (MLLMs) have advanced document understanding for high-resource languages, their performance on Arabic remains limited. In this work, we introduce Baseer, a vision-language model fine-tuned specifically for Arabic document OCR. Leveraging a large-scale dataset combining synthetic and real-world documents, Baseer is trained using a decoder-only fine-tuning strategy to adapt a pre-trained MLLM while preserving general visual features. We also present Misraj-DocOCR, a high-quality, expert-verified benchmark designed for rigorous evaluation of Arabic OCR systems. Our experiments show that Baseer significantly outperforms existing open-source and commercial solutions, achieving a WER of 0.25 and establishing a new state-of-the-art in the domain of Arabic document OCR. Our results highlight the benefits of domain-specific adaptation of general-purpose MLLMs and establish a strong baseline for high-accuracy OCR on morphologically rich languages like Arabic.
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- Asia > Middle East > Saudi Arabia > Eastern Province > Khobar (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
- Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
EMPOWER: Evolutionary Medical Prompt Optimization With Reinforcement Learning
Chen, Yinda, He, Yangfan, Yang, Jing, Zhang, Dapeng, Yuan, Zhenlong, Khan, Muhammad Attique, Baili, Jamel, Yee, Por Lip
Prompt engineering significantly influences the reliability and clinical utility of Large Language Models (LLMs) in medical applications. Current optimization approaches inadequately address domain-specific medical knowledge and safety requirements. This paper introduces EMPOWER, a novel evolutionary framework that enhances medical prompt quality through specialized representation learning, multi-dimensional evaluation, and structure-preserving algorithms. Our methodology incorporates: (1) a medical terminology attention mechanism, (2) a comprehensive assessment architecture evaluating clarity, specificity, clinical relevance, and factual accuracy, (3) a component-level evolutionary algorithm preserving clinical reasoning integrity, and (4) a semantic verification module ensuring adherence to medical knowledge. Evaluation across diagnostic, therapeutic, and educational tasks demonstrates significant improvements: 24.7% reduction in factually incorrect content, 19.6% enhancement in domain specificity, and 15.3% higher clinician preference in blinded evaluations. The framework addresses critical challenges in developing clinically appropriate prompts, facilitating more responsible integration of LLMs into healthcare settings.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
- North America > Dominican Republic (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (8 more...)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Health Care Providers & Services (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Sadeed: Advancing Arabic Diacritization Through Small Language Model
Aldallal, Zeina, Chrouf, Sara, Hennara, Khalil, Hamed, Mohamed Motaism, Hreden, Muhammad, AlModhayan, Safwan
Arabic text diacritization remains a persistent challenge in natural language processing due to the language's morphological richness. In this paper, we introduce Sadeed, a novel approach based on a fine-tuned decoder-only language model adapted from Kuwain 1.5B Hennara et al. [2025], a compact model originally trained on diverse Arabic corpora. Sadeed is fine-tuned on carefully curated, high-quality diacritized datasets, constructed through a rigorous data-cleaning and normalization pipeline. Despite utilizing modest computational resources, Sadeed achieves competitive results compared to proprietary large language models and outperforms traditional models trained on similar domains. Additionally, we highlight key limitations in current benchmarking practices for Arabic diacritization. To address these issues, we introduce SadeedDiac-25, a new benchmark designed to enable fairer and more comprehensive evaluation across diverse text genres and complexity levels. Together, Sadeed and SadeedDiac-25 provide a robust foundation for advancing Arabic NLP applications, including machine translation, text-to-speech, and language learning tools.
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- (4 more...)
- Research Report > New Finding (0.93)
- Research Report > Promising Solution (0.66)
Kuwain 1.5B: An Arabic SLM via Language Injection
Hennara, Khalil, Chrouf, Sara, Hamed, Mohamed Motaism, Aldallal, Zeina, Hadid, Omar, AlModhayan, Safwan
Enhancing existing models with new knowledge is a crucial aspect of AI development. This paper introduces a novel method for integrating a new language into a large language model (LLM). Our approach successfully incorporates a previously unseen target language into an existing LLM without compromising its prior knowledge. We trained a tiny model with 1.5 billion parameters named Kuwain by injecting the Arabic language into a small open-source model mainly trained in English. Our method demonstrates significant improvements in Arabic language performance, with an average 8% improvement across various benchmarks, while retaining the model's existing knowledge with a minimum amount of the original model's data. This offers a cost-effective alternative to training a comprehensive model in both English and Arabic. The results highlight the potential for efficient, targeted language model expansion without extensive retraining or resource-intensive processes.
- North America > United States (0.04)
- Asia > Middle East > Saudi Arabia > Eastern Province > Khobar (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Research Report > New Finding (1.00)
- Research Report > Promising Solution (0.86)
Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model
Hennara, Khalil, Hreden, Muhammad, Hamed, Mohamed Motaism, Aldallal, Zeina, Chrouf, Sara, AlModhayan, Safwan
We introduce Mutarjim, a compact yet powerful language model for bidirectional Arabic-English translation. While large-scale LLMs have shown impressive progress in natural language processing tasks, including machine translation, smaller models. Leveraging this insight, we developed Mutarjim based on Kuwain-1.5B , a language model tailored for both Arabic and English. Despite its modest size, Mutarjim outperforms much larger models on several established benchmarks, achieved through an optimized two-phase training approach and a carefully curated, high-quality training corpus.. Experimental results show that Mutarjim rivals models up to 20 times larger while significantly reducing computational costs and training requirements. We also introduce Tarjama-25, a new benchmark designed to overcome limitations in existing Arabic-English benchmarking datasets, such as domain narrowness, short sentence lengths, and English-source bias. Tarjama-25 comprises 5,000 expert-reviewed sentence pairs and spans a wide range of domains, offering a more comprehensive and balanced evaluation framework. Notably, Mutarjim achieves state-of-the-art performance on the English-to-Arabic task in Tarjama-25, surpassing even significantly larger and proprietary models like GPT-4o mini. We publicly release Tarjama-25 to support future research and advance the evaluation of Arabic-English translation systems.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Europe > Latvia > Riga Municipality > Riga (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Asia > Middle East > Saudi Arabia > Eastern Province > Khobar (0.04)
QDCNN: Quantum Deep Learning for Enhancing Safety and Reliability in Autonomous Transportation Systems
Meghanath, Ashtakala, Das, Subham, Behera, Bikash K., Khan, Muhammad Attique, Al-Kuwari, Saif, Farouk, Ahmed
In transportation cyber-physical systems (CPS), ensuring safety and reliability in real-time decision-making is essential for successfully deploying autonomous vehicles and intelligent transportation networks. However, these systems face significant challenges, such as computational complexity and the ability to handle ambiguous inputs like shadows in complex environments. This paper introduces a Quantum Deep Convolutional Neural Network (QDCNN) designed to enhance the safety and reliability of CPS in transportation by leveraging quantum algorithms. At the core of QDCNN is the UU{\dag} method, which is utilized to improve shadow detection through a propagation algorithm that trains the centroid value with preprocessing and postprocessing operations to classify shadow regions in images accurately. The proposed QDCNN is evaluated on three datasets on normal conditions and one road affected by rain to test its robustness. It outperforms existing methods in terms of computational efficiency, achieving a shadow detection time of just 0.0049352 seconds, faster than classical algorithms like intensity-based thresholding (0.03 seconds), chromaticity-based shadow detection (1.47 seconds), and local binary pattern techniques (2.05 seconds). This remarkable speed, superior accuracy, and noise resilience demonstrate the key factors for safe navigation in autonomous transportation in real-time. This research demonstrates the potential of quantum-enhanced models in addressing critical limitations of classical methods, contributing to more dependable and robust autonomous transportation systems within the CPS framework.
- Asia > Middle East > Qatar (0.14)
- Asia > India (0.14)
- Africa > Middle East > Egypt (0.14)
- Asia > Middle East > Saudi Arabia > Eastern Province > Khobar (0.14)
- Transportation > Infrastructure & Services (1.00)
- Information Technology > Robotics & Automation (0.70)
- Transportation > Passenger (0.70)
- Transportation > Ground > Road (0.49)
Revolutionizing Communication with Deep Learning and XAI for Enhanced Arabic Sign Language Recognition
Balat, Mazen, Awaad, Rewaa, Zaky, Ahmed B., Aly, Salah A.
This study introduces an integrated approach to recognizing Arabic Sign Language (ArSL) using state-of-the-art deep learning models such as MobileNetV3, ResNet50, and EfficientNet-B2. These models are further enhanced by explainable AI (XAI) techniques to boost interpretability. The ArSL2018 and RGB Arabic Alphabets Sign Language (AASL) datasets are employed, with EfficientNet-B2 achieving peak accuracies of 99.48\% and 98.99\%, respectively. Key innovations include sophisticated data augmentation methods to mitigate class imbalance, implementation of stratified 5-fold cross-validation for better generalization, and the use of Grad-CAM for clear model decision transparency. The proposed system not only sets new benchmarks in recognition accuracy but also emphasizes interpretability, making it suitable for applications in healthcare, education, and inclusive communication technologies.
- Asia > Middle East > Saudi Arabia > Eastern Province > Khobar (0.14)
- Asia > Middle East > Kuwait > Ahmadi Governorate > Al Ahmadi (0.04)
- Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)
- Africa > Middle East > Egypt > Alexandria Governorate > Alexandria (0.04)
- Health & Medicine (1.00)
- Education > Curriculum > Subject-Specific Education (0.97)
Advanced Arabic Alphabet Sign Language Recognition Using Transfer Learning and Transformer Models
Balat, Mazen, Awaad, Rewaa, Adel, Hend, Zaky, Ahmed B., Aly, Salah A.
This paper presents an Arabic Alphabet Sign Language recognition approach, using deep learning methods in conjunction with transfer learning and transformer-based models. We study the performance of the different variants on two publicly available datasets, namely ArSL2018 and AASL. This task will make full use of state-of-the-art CNN architectures like ResNet50, MobileNetV2, and EfficientNetB7, and the latest transformer models such as Google ViT and Microsoft Swin Transformer. These pre-trained models have been fine-tuned on the above datasets in an attempt to capture some unique features of Arabic sign language motions. Experimental results present evidence that the suggested methodology can receive a high recognition accuracy, by up to 99.6\% and 99.43\% on ArSL2018 and AASL, respectively. That is far beyond the previously reported state-of-the-art approaches. This performance opens up even more avenues for communication that may be more accessible to Arabic-speaking deaf and hard-of-hearing, and thus encourages an inclusive society.
- Asia > Middle East > Saudi Arabia > Eastern Province > Khobar (0.14)
- Asia > Middle East > Kuwait (0.04)
- Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)
- Africa > Middle East > Egypt > Alexandria Governorate > Alexandria (0.04)
- Education > Curriculum > Subject-Specific Education (0.93)
- Health & Medicine (0.89)
Dates Fruit Disease Recognition using Machine Learning
Brahim, Ghassen Ben, Alghazo, Jaafar, Latif, Ghazanfar, Alnujaidi, Khalid
Many countries such as Saudi Arabia, Morocco and Tunisia are among the top exporters and consumers of palm date fruits. Date fruit production plays a major role in the economies of the date fruit exporting countries. Date fruits are susceptible to disease just like any fruit and early detection and intervention can end up saving the produce. However, with the vast farming lands, it is nearly impossible for farmers to observe date trees on a frequent basis for early disease detection. In addition, even with human observation the process is prone to human error and increases the date fruit cost. With the recent advances in computer vision, machine learning, drone technology, and other technologies; an integrated solution can be proposed for the automatic detection of date fruit disease. In this paper, a hybrid features based method with the standard classifiers is proposed based on the extraction of L*a*b color features, statistical features, and Discrete Wavelet Transform (DWT) texture features for the early detection and classification of date fruit disease. A dataset was developed for this work consisting of 871 images divided into the following classes; Healthy date, Initial stage of disease, Malnourished date, and Parasite infected. The extracted features were input to common classifiers such as the Random Forest (RF), Multilayer Perceptron (MLP), Na\"ive Bayes (NB), and Fuzzy Decision Trees (FDT). The highest average accuracy was achieved when combining the L*a*b, Statistical, and DWT Features.
- Africa > Middle East > Tunisia (0.34)
- Africa > Middle East > Morocco (0.34)
- Europe > Middle East (0.14)
- (4 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)